CMIS170 Wiki
Welcome to the CMIS170 Wiki

XML and Data - Final Project for XML Course
Authors: Acevedo, Dunn, Peet

Additional pages:
* XML and Relational Databases
* References
* Contributions

XML and Data

Introduction

This page discusses how the Extensible Markup Language (XML) is used in conjunction with databases. XML offers an alternate way of extracting and storing data, which shows the relationship between XML technologies and database features. To better understand the two, let us first explain databases and XML in turn.

Databases

In information technology, a database is a collection of stored data that can later be retrieved. Richardson (2015) divides database features into five categories:
· Data model – relational model, key-value model, hierarchical model
· Application programming interface (API) – in-process vs. out-of-process; SQL vs. NoSQL
· Transactions – atomicity, consistency, isolation, durability
· Persistence – row-based, column-based, memory only, distributed
· Indexing – B-tree, B+ tree

Additionally, data persistence software systems commonly known as XML databases allow data to be stored and retrieved in XML structure. Since these are document-oriented databases, they are in turn classed as NoSQL databases. The XML format has increasingly proved beneficial for data storage and data transfer, because data held in XML is easy to manage and exchange.

XML-enabled databases offer one or more ways of storing XML within a traditional relational structure and are ideal where most of the data is non-XML:
· Storing the XML document in a character large object (CLOB)
· Dividing the XML document into a set of tables based on a schema
· Saving the XML in a native XML type as defined by the ISO

Native XML databases, in contrast, are better suited to data composed mostly of XML.
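The first two storage options listed for XML-enabled databases can be sketched in Python with the standard library's sqlite3 and xml.etree.ElementTree modules. This is only a minimal illustration, not how any particular database product works: SQLite's TEXT column stands in for a CLOB, and the table names, column names, helper functions, and sample document are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document (an assumption, not taken from the project).
BOOKS_XML = """<books>
  <book id="b1"><title>XML Basics</title><price>30.00</price></book>
  <book id="b2"><title>XQuery in Practice</title><price>45.50</price></book>
</books>"""


def store_as_clob(conn, doc_id, xml_text):
    """Option 1: keep the whole document as one large text value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS xml_docs (id TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute("INSERT INTO xml_docs VALUES (?, ?)", (doc_id, xml_text))


def shred_into_table(conn, xml_text):
    """Option 2: split the document into relational rows, one per <book>."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS book (id TEXT PRIMARY KEY, title TEXT, price REAL)"
    )
    root = ET.fromstring(xml_text)
    for book in root.findall("book"):
        conn.execute(
            "INSERT INTO book VALUES (?, ?, ?)",
            (book.get("id"), book.findtext("title"), float(book.findtext("price"))),
        )


conn = sqlite3.connect(":memory:")
store_as_clob(conn, "books-1", BOOKS_XML)
shred_into_table(conn, BOOKS_XML)

# The CLOB copy comes back unchanged; the shredded copy can be queried with SQL.
clob_body = conn.execute(
    "SELECT body FROM xml_docs WHERE id = ?", ("books-1",)
).fetchone()[0]
expensive_titles = [
    row[0]
    for row in conn.execute("SELECT title FROM book WHERE price > 40 ORDER BY title")
]
```

The trade-off is visible even in this toy: the CLOB copy preserves the document exactly but is opaque to SQL, while the shredded copy is queryable but no longer holds the original markup.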
Native XML databases have the advantage of preserving physical structure and comments; they allow documents to be stored without knowledge of the schema and to be accessed using XQuery, XPath, or the DOM. In addition, native XML databases use XML-specific APIs such as XQJ or the XML:DB API (Bourret, 2015). Native XML databases fall into two major categories: document-based storage and node-based storage. Various native XML databases exist in the market; a consolidated list can be found on Wikipedia at http://en.wikipedia.org/wiki/XML_database.

Extensible Markup Language (XML)

Extensible Markup Language, or XML, is derived from the Standard Generalized Markup Language (ISO 8879:1986). Because XML is a structured text format, data can be processed with relatively little human interaction. XML is now very commonly used for sharing information across networks and applications, and has succeeded prominently in applications such as podcasting, data storage, and Web services.

XQuery

XQuery is a case-sensitive query language that facilitates the extraction of data in XML format from various XML information sources, including both databases and documents. XQuery was created to satisfy the requirements and use cases outlined by the World Wide Web Consortium (W3C). Its syntax is built from lower-case keywords, which are not reserved words, together with symbols and operands. XQuery queries data whether “physically stored in XML or viewed as XML via middleware” (W3C, 2010). XQuery is best suited to querying XML documents or a mixture of XML and relational sources, whereas SQL retrieves data from relational databases (Jackson, 2005).

XQuery possesses basic data types similar to those of XML Schema:
- Numbers, including integers and floating-point numbers.
- Boolean values: true and false.
- Strings of characters, for example: "Hello world!" Strings are immutable: no character in a string can be altered.
- Data types that denote dates (year, month, day), times (hours, minutes, seconds), and intervals.
- QNames – a lexical QName consists of an optional namespace prefix and a local name, separated by a colon. QNames follow the syntax of XML qualified names.

XQuery uses node values to represent XML content. There are seven kinds of node: element, attribute, namespace, text, comment, processing-instruction, and document (root) nodes. Various basic XQuery functions create or return nodes. One instance is the doc() function, which returns the document root node of the XML file identified by its URL argument.

XQuery expressions can be embedded inside element constructors with the use of curly braces {}. This resembles the way template processors such as JSP, ASP, and PHP embed expressions in HTML content. Additionally, XML/HTML forms can be embedded inside expressions. The snippet below shows the use of curly braces:

let $i := 2 return
let $r := "Value" return
<p>{$r} of 10*{$i} is {10*$i}.</p>

Creates: <p>Value of 10*2 is 20.</p>

An XQuery expression evaluates to a sequence of simple values (atomic values and node values). Sequences cannot be nested. XQuery expressions are closely related to XPath expressions: XQuery derived from Quilt, a query language that borrowed features from several other languages, including XPath. One difference between XPath and XQuery is that an XPath 1.0 expression may return a node set, while the same XQuery expression returns a node sequence. In addition, XPath expressions are commonly used as patterns in XSLT stylesheets.

XQuery functions comprise:
- Built-in functions, which use the fn: prefix
- Function calls, for example {upper-case($booktitle)}
- User-defined functions, created by the user (w3schools, 2015)

Code snippet:

declare function local:minPrice($p as xs:decimal?, $d as xs:decimal?)
as xs:decimal?
{
  let $disc := ($p * $d) div 100
  return ($p - $disc)
};

Below is an example of how to call the function above:

{local:minPrice($book/price, $book/discount)}

Sorting and Context

A sequence can be sorted with the sortby expression. This syntax comes from early XQuery drafts; XQuery 1.0 replaced it with the order by clause of FLWOR expressions. The sortby expression accepts an input sequence ($movies) and one or more ordering expressions. To decide which of two values from the input sequence comes first, it evaluates the ordering expression in the context of each value. So the path expression title/producer is evaluated many times, each time with a different movie as the context item.

Syntax: $movies sortby (title/producer)

Like Java, C#, and other languages, XQuery mixes static typing and dynamic typing; however, its types are different. XQuery types match its data model, and types can be imported from XML Schema. A dynamic type test looks like this:

if ($child instance of element(section))
then process-section($child)
else ( ) (: nothing :)

Processing Model

XQuery is defined in terms of the logical structure of an XML document (the data model) rather than its surface syntax, together with the expression context. The schematic representation of the processing model distinguishes external processing, query processing, and schema import processing. The external processing domain covers the production of the XDM instance that represents the data to be queried, schema import processing, and serialization. The query processing domain comprises the static analysis phase and the dynamic evaluation phase.

Data Model Generation (DM)

Before a query can be processed, the input data must be represented as an XDM instance. To achieve this, the XML document is first parsed with an XML parser and may then be validated against one or more schemas. The resulting Post-Schema-Validation Infoset (PSVI) is transformed into an XDM instance as described in the XQuery and XPath Data Model. This process takes place outside the domain of XQuery.
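As a rough analogy to data model generation, the sketch below parses a small document into an in-memory tree and lists the kinds of nodes it contains. It assumes Python's xml.dom.minidom, which builds a DOM tree rather than a true XDM instance and performs no schema validation; the sample document and the helper name node_kinds are hypothetical.

```python
from xml.dom import Node, minidom

# Hypothetical sample input (an assumption for illustration only).
SAMPLE = (
    '<?xml version="1.0"?>'
    '<movie year="1982"><!-- sample --><title>Star Trek II</title></movie>'
)

# Map DOM node-type constants to the node-kind names used in the text.
KIND_NAMES = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}


def node_kinds(node, found=None):
    """Walk the tree depth-first and collect (kind, name-or-content) pairs."""
    if found is None:
        found = []
    kind = KIND_NAMES.get(node.nodeType, "other")
    if node.nodeType in (Node.TEXT_NODE, Node.COMMENT_NODE):
        label = node.data
    else:
        label = node.nodeName
    found.append((kind, label))
    for child in node.childNodes:
        node_kinds(child, found)
    return found


# Parsing turns the text into a tree; this step happens before any query runs.
document = minidom.parseString(SAMPLE)
kinds = node_kinds(document)

# In the DOM, attributes hang off their element rather than appearing as children.
attributes = dict(document.documentElement.attributes.items())
```

Here `kinds` starts with the document node and then lists the element, comment, and text nodes in document order, while `attributes` holds the `year` attribute separately.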
Schema Import Processing (SI)

The in-scope schema definitions in the static context can be generated from XML Schema documents or by any other mechanism, provided the result complies with the consistency constraints.

Expression Processing

XQuery defines two phases of processing: the static analysis phase and the dynamic evaluation phase.

Static analysis phase (SQ) – depends on the expression itself and on the static context, not on the input data. During this stage:
- The query is parsed into an operation tree.
- The implementation initializes the static context, then changes and augments it based on information in the prolog.
- If the Schema Aware Feature is supported, the query prolog may contain a schema import, and the in-scope schema definitions are populated with information from the imported schemas. If the Module Feature is supported, the static context is extended with function declarations and variable declarations from imported modules.
- The static context is used to resolve schema type names, function names, namespace prefixes, and variable names.
- The operation tree is normalized by making explicit the implicit operations such as atomization and extraction of Effective Boolean Values.

Dynamic evaluation phase (DQ) – starts once the static analysis phase has ended; in this stage the value of the expression is computed. This phase:
- Relies on the operation tree of the expression being evaluated, on the input data, and on the dynamic context
- Draws information from the external environment and the static context
- May create new data-model values and may extend the dynamic context

Serialization (DM)

Serialization converts an XDM instance into a sequence of octets.

Consistency Constraints

For XQuery 3.0 to be well defined, the input XDM instance, the static context, and the dynamic context must be mutually consistent.
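The flow from data model generation through evaluation to serialization can be loosely illustrated with Python's xml.etree.ElementTree. This is only an analogy, assuming an ElementTree tree in place of an XDM instance and ElementTree's limited path syntax in place of XQuery; the sample data and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input, supplied as octets the way a file or network source would be.
SOURCE = (
    b"<movies>"
    b"<movie><title>Alien</title></movie>"
    b"<movie><title>Brazil</title></movie>"
    b"</movies>"
)

# Data model generation: octets in, in-memory tree out.
tree_root = ET.fromstring(SOURCE)

# Evaluation: compute a value from the tree (here, a sequence of title strings).
titles = [title.text for title in tree_root.findall("./movie/title")]

# Evaluation may create new data-model values: build a fresh result tree.
result = ET.Element("titles")
for title in titles:
    ET.SubElement(result, "title").text = title.upper()

# Serialization: tree in, sequence of octets out.
octets = ET.tostring(result, encoding="utf-8")
```

Parsing and serialization sit on either side of the evaluation step, which mirrors the way the processing model places them in the external processing domain rather than inside query processing.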
Glossary
- Atomization
- Data model
- Dynamic context
- Effective Boolean Values
- Expression context
- In-scope schema definition
- Module Feature
- Schema Aware Feature
- XDM instance
- XML Infoset
- XQuery and XPath Data Model

Data Storage

Not so many years ago, one could store data in only one of two ways: in an efficient tabular form (relational databases queried with SQL, such as Oracle) or in XML, which more easily handles nested data and text documents. Fortunately, we now have hybrids of these two systems, which give us the advantages of both.

Extracting Data from XML – The Data Models

XML is a text-based language. Once XML is read into memory, however, it is typically represented as a tree. There are numerous ways to represent and access these trees; they are all very similar but differ in their limitations and advantages. Three of the primary models are the Document Object Model (DOM), the XPath Data Model (XDM), and the Post-Schema-Validation Infoset (PSVI).

* DOM forms the basis of many other powerful implementations. Each item in an XML document (element, attribute, block of text, etc.) is represented as a node, with dependent items drawn as child nodes. There are different types of nodes (element node, attribute node, etc.) which have different properties. When accessed, the low-level DOM application programming interface (API) returns a node list. DOM does have its limitations, though. For one, it is quite verbose compared with other approaches.
** Sample XML text and DOM tree:
* The XML Path Language (XPath), while not a standalone language in its own right, is designed to be embedded in and used by multiple other languages such as XSLT, XLink, XPointer, and XQuery. It can be used from PHP, Python, C, C++, Java, and a host of other languages. Early XPath (XPath 1.0, which is still widely used) utilizes DOM APIs to return a node list. It is an easy and intuitive method for finding items in XML trees.
It is the basis for XSLT and XQuery.

XPath

XPath (XML Path Language) is designed to select parts of an XML document by means of an expression. While XPath is an expression language in its own right, it was designed to be hosted in another environment and is typically embedded in languages such as Python, Java, C, C++, XML Schema, XSLT, XQuery, and many more. XPath essentially evaluates XML trees or DOM node lists, searching for specified items. The XPath expression essentially ‘navigates’ through the node tree, evaluating each node or ‘context item’ until it finds, or fails to find, the item requested. While XPath is a quick, powerful way to refer to portions of an XML document, it also forms the basis of more complex technologies such as XSLT and XQuery.

A simple example of an XPath query using the sample above would be:

/entry/body/p/born

This would return Leonard Nimoy’s date of birth. Another way to use XPath is with predicates, which are written in square brackets. Predicates can test specified values or positions. For example, assuming that the entry in the sample above was one of many famous people in a book and that each entry carries an id attribute, one could use the following expression to find the entry for Leonard Nimoy:

/book/entry[@id="nimoy-leonard"]

Alternatively, if the book of famous people was divided into chapters (Chapter 1: Politicians, Chapter 2: Scientists, Chapter 3: Actors, etc.), one could use the following positional predicate to find the chapter on actors:

/book/chapter[3]

The basic syntax of XPath is quite simple: navigating from a parent node to a child node requires “/”, while selecting an attribute requires “@”. XPath can become more complex, with operators, comparisons, different literal values and types, and even expressions such as “for”, “let”, “if”, “some”, and “every”. Add to this the ability to cast values from one type to another, string functions, numeric functions, and even the ability to define your own functions, and the language has the power and diversity to handle nearly every situation.
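The path steps and the two kinds of predicate described in this section can be tried with Python's xml.etree.ElementTree, which supports a limited subset of XPath. The sketch below is a minimal illustration under stated assumptions: the sample document is hypothetical (the original sample is not reproduced here), entries are assumed to carry an id attribute, and paths are written relative to the root element because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical book of famous people, divided into chapters.
BOOK_XML = """<book>
  <chapter title="Politicians"/>
  <chapter title="Scientists"/>
  <chapter title="Actors">
    <entry id="nimoy-leonard">
      <body><p><born>1931-03-26</born></p></body>
    </entry>
  </chapter>
</book>"""

book = ET.fromstring(BOOK_XML)

# Value predicate: select the entry whose id attribute has a given value.
# ".//" searches all descendants of the root element.
entry = book.find(".//entry[@id='nimoy-leonard']")

# Child steps separated by "/": walk from the entry down to the born element.
born = entry.findtext("./body/p/born")

# Positional predicate: the third chapter child (XPath positions start at 1).
third_chapter = book.find("./chapter[3]")
chapter_title = third_chapter.get("title")
```

A full XPath processor would also accept the absolute forms used in the text, such as /book/chapter[3]; the relative forms here select the same nodes when evaluated from the root element.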
XSLT

Extensible Stylesheet Language Transformations (XSLT) is a declarative language used to transform an XML document into text, HTML, or a different XML document. Because XSLT is a declarative, functional language rather than a procedural one like Java, C#, and other programming languages, you specify what should be done in certain circumstances and the processor decides how it should be done. The backbone of XSLT is the template. A template is matched against an XML document and produces output based upon its processing. A stylesheet is the document that holds all the templates. XSLT uses XPath to select nodes and extract data. XSLT does not change the original document; rather, it creates a new document in the specified form based on the data selected from the existing XML document.

XSLT has seen many improvements since its inception in the late 1990s, with versions 2.0 and 3.0 following the original. Version 2.0 added functions that improved text handling, such as tokenize(). The newest version, 3.0, is slated to include string evaluation and improved error handling with a Java-like exception-handling ability.
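The template idea can be imitated in plain Python. The sketch below is not XSLT and uses no XSLT processor; it is a toy, hypothetical transformer in which each registered function plays the role of a template matched by element name, and a new output tree is built while the source tree is left untouched. It also uses re.split as a rough counterpart to tokenize(). All names and the sample data are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical source document.
SOURCE_XML = (
    "<movies><movie genres='sci-fi, drama'><title>Solaris</title></movie></movies>"
)

# Registry of "templates": element name -> function that produces output.
TEMPLATES = {}


def template(match):
    """Register a function as the template for elements named `match`."""
    def register(func):
        TEMPLATES[match] = func
        return func
    return register


def apply_templates(node, parent_out):
    """Run the template matching `node`; otherwise descend into its children."""
    handler = TEMPLATES.get(node.tag)
    if handler is not None:
        handler(node, parent_out)
    else:
        for child in node:
            apply_templates(child, parent_out)


@template("movies")
def movies_template(node, out):
    ul = ET.SubElement(out, "ul")
    for child in node:
        apply_templates(child, ul)


@template("movie")
def movie_template(node, out):
    li = ET.SubElement(out, "li")
    # re.split breaks a string on a regular expression, much as tokenize() does.
    genres = re.split(r",\s*", node.get("genres", ""))
    li.text = "%s (%s)" % (node.findtext("title"), " / ".join(genres))


source = ET.fromstring(SOURCE_XML)
html = ET.Element("html")
apply_templates(source, html)  # builds a new tree; `source` is not modified
output = ET.tostring(html, encoding="unicode")
```

The templates say what to produce for each kind of element, while apply_templates decides the order of processing, which is the division of labor the section describes between stylesheet author and processor.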